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ABSTRACT 

We present a new high-resolution A-body algorithm for cosmological simulations. The al¬ 
gorithm employs a traditional particle-mesh technique on a cubic grid and successive multilevel 
relaxations on the finer meshes, introduced recursively in a fully adaptive manner in the regions 
where the density exceeds a predefined threshold. The mesh is generated to effectively match an 
arbitrary geometry of the underlying density field - a property particularly important for cos¬ 
mological simulations. In a simulation the mesh structure is not created at every time step but 
is properly adjusted to the evolving particle distribution. The algorithm is fast and effectively 
parallel: the gravitational relaxation solver is approximately half as fast as the fast Fourier trans¬ 
form solver on the same number of mesh cells. The required CPU time scales with the number 
of cells, Nc, as ^ 0{Nc). The code allows us to improve considerably the spatial resolution of 
the particle-mesh code without loss in mass resolution. We present a detailed description of the 
methodology, implementation, and tests of the code. 

We further use the code to study the structure of dark matter halos in high-resolution 2h~^ 
kpc) simulations of standard CDM (D = 1, /i = 0.5, = 0.63) and ACDM (Ua = 1 — Uq = 0.7, 

h = 0.7, a% = 1.0) models. We hnd that halo density profiles in both CDM and ACDM models 
are well fitted by the analytical model presented recently by Navarro et ah, which predicts a 
singular \p{r) oc r~^] behavior of the halo density profiles at small radii. We therefore conclude 
that halos formed in the ACDM model have structure similar to CDM halos and thus cannot 
explain the dynamics of the central parts of dwarf spiral galaxies, as inferred from the galaxies’ 
rotation curves. 


Subject headings: methods: numerical - cosmology: theory - dark matter 
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1. Introduction 

-/V-body techniques are used in cosmological simulations to follow the nonlinear evolution of a system 
of particles, and to give theoretical predictions about the matter distribution that can be compared with 
observations. The traditional IV-body methods are the particle-mesh (PM), particle-particle/particle-mesh 
(P^M), and TREE methods (Hockney & Eastwood 1981; Klypin & Shandarin 1983; Efstathiou et al. 1985; 
Bouchet & Hernquist 1988, and references therein). Although numerous fundamental results have been 
obtained using these codes, the codes often cannot provide desirable spatial or mass resolution with currently 
available computers because of either memory or CPU limitations. Thus, for example, the PM code can 
handle a large number of particles (the latest PM simulations follow evolution of approximately 6 x 10^ 
particles) but is limited in spatial resolution (to increase resolution by a factor of 2 requires 8 times as much 
memory; the largest PM simulations have reached dynamic range of ~ 1500). TREE and P^M codes are 
CPU limited^ because calculation of forces in these codes is considerably slower than in the PM code and, 
in the case of the P^M code, is also strongly dependent on the degree of particle clustering. In an ideal 
cosmological simulation one needs a resolution ~ 1 — 10 kpc to resolve a galaxy and a simulation cube of 
~ 100 Mpc to sample appropriately the longest perturbation waves or to get sufficient statistics. The number 
of particles should be sufficiently large (usually a few million or larger) to allow halo properties to be reliably 
determined. The required dynamical range is thus ^ 10^ — 10®, which is higher than the above codes can 
provide for the required number of particles and with currently available computers. These limitations have 
motivated the development of new methods with better resolution and/or performance. 

Villumsen (1989) developed a code in which the PM grid was complemented by hner cubic subgrids 
to increase the force resolution in regions of interest. The local potential was calculated as a sum of the 
potentials on the subgrids and on the PM grid. A similar approach was adopted by Jessop, Duncan, & Chau 
(1994) in their particle-multiple-mesh code. However, instead of summing the potentials from subgrids, the 
potential on each level was obtained independently by solving the boundary problem. Boundary values 
of the potential were interpolated from the coarser parent grid. Couchman (1991) used cubic refinement 
grids to improve the performance of the P®M algorithm. Here, the resolution of the P®M code was retained 
while the computational speed was considerably increased. In the Lagrangian approach (Gnedin 1995; Pen 
1995) the computational mesh is not static but moves with the matter so that the resolution increases 
(smaller mesh cells) in the high density regions and decreases elsewhere. Although potentially powerful, this 
approach has its caveats and drawbacks (Gnedin & Bertschinger 1996). The mesh distortions, for example, 
may introduce severe force anisotropies. A different approach was adopted by Xu (1995), who developed the 
TPM code, a hybrid of the PM and TREE algorithms. The gravitational forces in the TPM are calculated 
via a PM scheme on the grid and via multipole expansions (TREE algorithm) in the regions where higher 
force resolution is desired. The forces on the particles in low-density regions are calculated by the PM 
scheme, while forces on the particles in high-density regions are the sum of external large-scale PM force and 
internal short-scale force from the neighboring particles. Although this code may not be faster than a pure 
TREE code, it is effectively parallel because particles in different regions can be evolved independently. An 
adaptive multigrid code for cosmological simulations was recently presented by Suisalu & Saar (1995). In this 
code, finer rectangular subgrids are adaptively introduced in regions where the density exceeds a specified 
threshold. For each subgrid, the potential is calculated using boundary conditions interpolated from the 
coarser grid. The solution on the hner grid is used to improve the solution on the coarser grid. Another 


Although in the case of relatively small number of particles, Np < 256®, the TREE and P®M codes can provide higher 
dynamic range than the PM code 
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variant of an adaptive particle-multiple-mesh code for cosmological simulations was recently presented by 
Gelato, Chernoff, & Wasserman (1996). This code can handle isolated boundary conditions, which makes it 
applicable to noncosmological problems. 

All of the above multigrid methods use rectangular subgrids to increase force resolution. For simulations 
where there are only a few small regions of interest (e.g., a few galaxies or clusters of galaxies) the rectangular 
refinements may be a good choice because these regions can be easily covered by rectangular subgrids. It is, 
however, well known that the geometry of structures in realistic cosmological models is usually a complicated 
network of sheets, filaments, and clumps which are difficult to cover efficiently with rectangular grids. 

In this paper we present a new Adaptive Refinement Tree (ART) high-resolution A^-body code. This code 
was developed to improve the spatial resolution of a particle-mesh code by about two orders of magnitude 
without loss of mass resolution or computational speed. In our scheme, the computational volume is covered 
by a cubic grid that defines the minimum resolution. On this grid, the Poisson equation is solved with a 
traditional fast Fourier transform (FFT) technique using periodic boundary conditions. The finer meshes^ 
are built as collections of cubic, non-overlapping cells of various sizes organized in octal threaded trees in 
regions where the density exceeds a predefined threshold. Any mesh can be subject to further refinements; 
the local refinement process stops when the density criterion is satisfied. Once constructed, the mesh, rather 
than being destroyed at each time step, is promptly adjusted to the evolving particle distribution. To solve 
the Poisson equation on these refinement meshes, we use a relaxation method with boundary conditions and 
an initial solution guess interpolated from the previous coarser mesh. Below we present the method (§2), 
describe the code (§3), and discuss the tests (§4). We then compare the code with other algorithms (§4) and 
finally apply it to a real cosmological problem (§5). 


2. Methodology 
2.1. Adaptive mesh refinement 

Adaptive mesh refinement (AMR) techniques for solving partial differential equations (PDFs) have 
numerous applications in different fields of physics, astrophysics, and engineering in which large dynamic 
range is important. There are two major approaches in the application of these techniques. In the first 
approach (e.g., Berger & Oliger 1984; Berger 1986; Berger & Colella 1989), the computational volume is 
divided into cubic elements (cells), while in the second (e.g., Lohner & Baum 1991) the cells can have an 
arbitrary shape. Collections of cells are used as computational meshes on which the PDFs are discretized. 
We will call meshes composed of cubic cells regular, calling meshes irregular otherwise. The integration of 
PDFs is simpler on regular meshes, but dealing with complicated boundaries may be a difficult problem. 
With irregular meshes one can handle complicated boundaries much more easily. The price, however, is 
more elaborate algorithms, data structures, and associated CPU and memory overhead. A particular choice 
of the mesh structure is a tradeoff between these considerations. In astrophysics there are no complicated 
boundaries, and a cubic computational volume is usually used to model a system. In this case, there is no 
need for irregular meshes and it is preferable to use meshes made of cubic cells. 

The regular meshes themselves can be organized in different ways. The usual practice is to use regular 


^We will use the word grid to refer to cubic or rectangular configurations, reserving the word mesh for configurations of 
arbitrary shape. 
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meshes of cubic or rectangular shape (e.g., Berger & Colella 1989) organized in arrays (grids), which allows 
one to simplify data structures and to use standard PDE solvers. These arrays can be organized in a tree 
(Berger 1986) to form a multigrid hierarchy. The main disadvantage of the grids is that one cannot cover 
regions of complicated shape in an efficient way. Moreover, the arrays are an inflexible data structure, and 
the whole refinement hierarchy should be periodically rebuilt, not adjusted, when dealing with unsteady 
solutions. 

In our approach, we use regular meshes but they are handled in a completely different way. Cells are 
treated as individual units which are organized in refinement trees (see §3.2). Each tree has a root - a cell 
belonging to a base cubic grid that covers the entire computational volume. If the root is rehned (split) 
- it has eight children (smaller nonoverlapping cubic cells residing in its volume), which can be refined in 
their turn, and so on. Cells of a given refinement level are organized in linked lists and form a rehnement 
mesh. The tree data structures make mesh storage and access in memory logical and simple, while linked 
lists allow for efficient mesh structure traversals. In the current version of the code we make use of octal 
threaded trees (Khokhlov 1997) and doubly linked lists (e.g., Knuth 1968; Aho, Hopcroft, & Ulman 1983; 
Corner, Leiserson, & Rivest 1994). The fact that cells are treated as independent units rather than element 
of a grid allows us to build a very flexible mesh hierarchy which can be easily modified. The details of the 
mesh generation and modification in our code are described in §3. 


2.2. Multilevel relaxation method 

The multigrid techniques of solving partial differential equations (Brandt 1977) are very successful in 
reducing the computational and storage requirements for solving many types of PDEs (Wesseling 1992; Press 
et al. 1992). There are two kinds of multigrid algorithms. The first, sometimes called the multigrid method, 
is used to speed up convergence of relaxation methods. In this method, the source term is defined only on 
the base finest grid - all the other, coarser grids are used as a workspace. In the second algorithm, called 
full multigrid, the source term is defined on all grids, and the method obtains successive solutions on finer 
and finer grids. The latter method is useful when dealing with grids created in adaptive refinement process. 
The full multigrid scheme can be used differently in its turn, depending on how the solutions on different 
levels influence each other. In the one-way interface scheme, the solution from a coarser grid is used to get 
a first-guess solution on the finer grid and often to get boundary values as well. However, the solution on 
the coarser grid is not influenced by the solution on the finer grid (e.g., Jessop, Duncan, & Chau 1994). In 
the two-way interface scheme, the coarser grid solution is used to correct the solution on the finer grids and 
vice-versa. The choice of a particular scheme is usually determined empirically and is problem dependent. 
The two-way interface scheme is more difficult to implement in the case of periodic boundary conditions 
(Suisalu & Saar 1997). 

In our approach, each refinement mesh is composed of cells of the same refinement level, but these 
meshes are completely different from grids. The techniques are thus multilevel rather than multigrid. We 
use an analog of the full multigrid algorithm with a one-way interface between the meshes. We use a regular 
cubic grid covering the whole computational volume as the zeroth or coarsest level. At this level, the Poisson 
equation is solved using a standard EFT method with periodic boundary conditions. This solution is then 
interpolated onto the finer first-level mesh to get the boundary values and first-guess solution. Once the 
boundary problem is defined, we use a relaxation method (e.g., Press et al. 1992) to solve the Poisson 
equation on the mesh. Since we start from an initial guess which is already close to the final solution, the 
iterative relaxation procedure converges quickly. After we get the solution on the first refinement level, the 
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same procedure (obtaining boundary values and initial guess by interpolation from the previous coarser level) 
is repeated for the next level, and so forth. At the end of this process we have the solution (potential) for 
all cells. The description of the code is given in the next section. 


3. Description of the code 
3.1. Code structure 

The structure of the code can be outlined as follows. First of all, we set up the initial positions and 
velocities of the particles using the Zeldovich approximation, as described by Klypin & Shandarin (1983). 
Once the initial conditions are set, we construct the regular cubic grid covering the whole computational 
volume and then proceed to check whether additional refinement levels are required according to the current 
density threshold. At this point the code enters the main computational loop, which includes: 

• density assignment on all existing meshes; 

• a gravitational solver; 

• routine updating of particle positions and velocities; 

• modifications to the mesh hierarchy. 

The mesh modifications (refinement and derefinement) are based on the density distribution^. The modifi¬ 
cations are made at the end of the computational cycle. At this point the density distribution is available, 
since it was calculated for the gravitational solver. 

Below we will describe each of these major functional blocks in detail. We will also discuss timing, 
energy conservation, and the memory requirements of the code. 


3.2. Mesh generator. 

The adaptive mesh refinement block of the code generates new meshes and modifies existing ones. 
The refinement hierarchy in our implementation is based on the regular cubic grid that covers the entire 
computational volume. With the refinement block turned off, the density assignment and gravity solver on 
this grid are similar to those in the PM code of Kates, Kotok, & Klypin (1991). 

The data structures that we use to organize the mesh cells are very similar to those implemented in the 
hydrodynamical Eulerian tree refinement code^ (Khokhlov 1997). All mesh cells are organized in refinement 
trees. A cell can be a parent of eight children - smaller cubic cells of equal volume residing in it. Each child 
may be in its turn split and have children. Each tree has a root (a zeroth-level cell) that may be the only 
cell in this tree if it is unsplit. The tree ends with unsplit cells, which we call leaves. This structure is called 
an octal rooted tree, and is the construct used in TREE codes. There is, however, an important difference 


®The density criterion is a natural choice in our case because we aim to resolve high-density regions. We could use, however, 
any other appropriate criterion, e.g., local potential gradient, force accuracy, etc. 

^There are, however, some important modifications required by specifics of the cosmological simulations. 
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between our code and TREE codes. We use fully threaded trees, in which cells are connected with each 
other on all levels. In addition, cells that belong to different trees are connected to each other across tree 
boundaries. In fact, we can consider all cells as belonging to a single threaded tree with a root being the 
entire computational domain and the base grid being one of the tree levels. The tree structure is supported 
through a set of pointers. Each cell has a pointer to its parent and a pointer to its first child. In addition, 
cells have pointers to the six adjacent cells (these make the tree fully threaded) so that information about a 
cell’s neighbors is easily accessible (see Eig.l). Overall, the following information is provided for each cell i 
belonging to a tree: 

• Level(i), the level of the cell in the tree; 

• Parent{i), the pointer to the parent cell; 

• Child{i), the pointer to the cell’s first child, or nil if the cell is a leaf; 

• Nb{i,j), pointers to neighboring cells (j = 1, ...,6); 

• Pos{i,j), position in space (j = 1, ...,3); 

• Var(i, n), storage for associated physical variables (in our case n = 2, as we store both the density and 
the potential). 

The above set of pointers is sufficient to support the tree structure and to change it dynamically with 
minimum cost. In addition, the cells on each level of the mesh hierarchy are organized in doubly linked lists^ 
(e.g., Knuth 1968) so that a sweep through a given level (the operation used extensively in the multigrid 
relaxations described below) can be done with minimum CPU time. This organization adds two pointers 
for each eight siblings, or 1/4 storage elements per cell. The cells belonging to the base regular grid (level 
zero), while part of the same data structure as the other cells, are created only at the very beginning of a 
simulation and are never destroyed. It is therefore unnecessary to keep information about a cell’s position or 
pointers to its neighbors because they can be easily computed. The number of pointers can be considerably 
reduced (by as much as a factor of 2) because some of them can be shared by siblings (sets of eight cells 
with the same parent). 

An elementary refinement process creates eight new cubic cells of equal volume {children) inside a parent 
cell. When the parent is refined, we check if all six neighbors are of the same level as the parent. If there 
are coarser neighbors (of smaller level than the parent), we split those neighbors. If a neighbor in its turn 
has coarser neighbors, we split the neighbor’s neighbors, and so forth. We thus build a refinement structure 
which obeys a rule allowing no neighbor cells with level difference greater than 1. Examples of allowed and 
prohibited configurations are shown in Eigure 2. Although this is the only rule in the whole refinement 
process, it determines the global structure of the resulting refinement hierarchy, assuring that there are no 
sharp resolution gradients on a level’s boundaries. On the next refinement pass, each of the newborn children 
is checked against the density criterion and can be subdivided into 8 children in its turn if further splitting is 
needed. The process stops when either the density criterion is satisfied everywhere or the maximum allowed 
refinement level is reached. 


®The difference between a doubly linked list and the usual linked list (used, for example, in the P®M codes) is that in the 
former we keep not only a pointer to the next element but also a pointer to the previous element in the list. This allows us to 
insert and delete list entries without rebuilding the whole list. 
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The refinement process proceeds level by level starting from the base grid. On any level of the mesh 
hierarchy the process can be split into two major parts. First, we mark up^ all the cells which need to be 
split, creating a refinement map. However, a map constructed in this way tends to be “noisy”. We smooth it 
by marking additional cells so that any cell which was originally marked is surrounded by a buffer of at least 
two other marked cells. We construct this buffer using an algorithm which includes several passes through a 
level, each one marking additional cells. During the first pass the neighbors of cells marked in the refinement 
map are marked for splitting also. After that, two passes are made in which we mark for splitting only 
those cells which have at least two neighbors already marked for refinement (note that when we speak of 
marked cells, we mean cells marked only during passes before the one we are discussing, not during the pass 
under consideration). These three passes create a one-cell cubic buffer around each of the cells marked in 
the original refinement map. Each additional set of three passes similar to those described above will build 
one more cubic layer around every originally marked cell. Therefore, to build a two-cell buffer we make six 
passes. When the map is completed, it is used to make the actual splitting. 

The refinement procedure described above can be used either to construct the mesh hierarchy from 
scratch or to modify the existing meshes. However, in the course of a simulation the structure is neither 
constructed nor destroyed. Instead, in every computational cycle we modify existing meshes to account for 
the changes in particle distribution. Therefore, we need to make not only refinements but also derefinements 
(in the places where it is no longer necessary to keep resolution at the current level), which is accomplished 
in the same manner as refinement by constructing a derefinement map - that is, a map of cells marked for 
joining. If the joining violates the above-mentioned neighbor rule, nothing is done and the cell remains split. 
Therefore, the code modifies the existing structure dynamically, keeping the refinements in accord with the 
ever-changing density field. Modifying the hierarchy requires much less CPU time than rebuilding it because 
only a small number of cells needs to be modified at any given time step. Figure 3 shows an example of the 
refinement mesh hierarchy built in one of the ACDM cosmological simulations described in §4.4. A selected 
region of Figure 3 is shown expanded in Figure 4. 


3.3. Particles within the mesh hierarchy and density assignment 

Particle coordinates are not sufficient to specify the particle-mesh connection because cells of different 
levels can share the same volumes; we need to know, however, which particles belong to a given cell. We 
keep track of the particles by arranging them in doubly linked lists so that every cell “knows” its head 
linked-list particle (the head is nil if the cell is empty) and thus all the other particles in this linked list. If a 
particle moves from cell to cell, it is deleted from the linked list of the cell it leaves and is added to the new 
cell’s linked list. Only leaves are allowed to own particles. Once a cell is split, all its particles are divided 
among its children. However, we solve the Poisson equation on every refinement level, so that the value of 
the density must be computed for every cell regardless of whether or not it is a leaf. On each level, starting 
from the finest level and up to the zeroth level, the density is assigned using the standard cloud-in-cell (CIO) 
technique (Hockney & Eastwood 1981). Because particles belong only to the finest cells enclosing them, 
when we change between levels the particles are passed from children to their parents. The particles are 
transferred in this way only as far as the density assignment is concerned; the linked list is not changed. The 
particles, therefore, contribute to the density on any level in which they are physically located. 


In this step, a cell is marked for splitting if the local density exceeds a predefined level-dependent threshold. 



3.4. Poisson solver 


The fact that the zeroth level of the mesh hierarchy is a cubic regular grid of fixed resolution allows us 
to use the FFT method to solve the Poisson equation on this grid (Hockney & Eastwood 1981). The FFT 
technique naturally supports periodic boundary conditions which is important for cosmological simulations. 
Moreover, the FFT is well benchmarked and is about twice as fast as the relaxation method described below. 

The Poisson equation on the refinement meshes is defined as a Dirichlet boundary problem for which 
boundary values are obtained by interpolating the potential from the parent grid. In our algorithm, the 
boundaries of the refinement meshes can have an arbitrary shape, which narrows the range of PDE solvers 
one can use. To solve the Poisson equation on these meshes, we have chosen the relaxation method (Hockney 
& Eastwood 1981; Press et al. 1992), which is relatively fast and efiicient in dealing with complicated 
boundaries. In this method the Poisson equation 




( 1 ) 


is rewritten in the form of a diffusion equation. 
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The point of the method is that an initial solution guess (j) relaxes to an equilibrium solution (i.e., solution 
of the Poisson equation) as t ^ oo. The finite-difference form of equation (2) is: 
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where the summation is performed over a cell’s neighbors. Here, A is the actual spatial resolution of the 
solution (potential), while At is a fictitious time step (not related to the actual time integration of the 
A-body system). This finite difference method is stable when At < A^/6 (Press et al. 1992). If we choose 
the maximum allowed time step At = A^/6, the above equation can be rewritten in the form of the following 
iteration formula: 
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The relaxation iteration thus averages the potential of a cell’s six neighbors and subtracts the contribution 
from the source term. Cells in the boundary layer will have some neighbors belonging to the coarser level. In 
this case, we need to interpolate to get the potential at the location of the expected neighbor. It is desirable 
that the interpolation maintain continuity and isotropy of the force (see discussion in Jessop et al. 1994). 
We have found that linear interpolation perpendicular to the boundary which incorporates both coarser and 
finer cell potentials is satisfactory; we get the interpolated value of the potential on the boundary of level I 
as: 

4>int = + (I - W^)4>l-i. (5) 


Here Wi is a weight, and </>/ and are the potentials of a boundary cell of level I and of its {I — l)-level 
neighbor. We found the optimal value of Wi to be 0.2 by minimizing the force discontinuity for particles 
moving through mesh boundaries. The iterative procedure described above is repeated until the desired level 
of convergence is achieved. We can speed up the convergence of the relaxation procedure considerably by 
using an initial guess for the solution that is already close to the final solution. Such an initial guess can be 
obtained by interpolating the potential from the previous coarser mesh, for which the Poisson equation was 
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already solved. By doing so, we need only 2 — 3 iterations to find the potential to an accuracy of a one or 
two percent. Nevertheless, a higher accuracy is needed because the potential is then differentiated to get the 
accelerations; the errors in accelerations are thus larger than the errors in the potential. Therefore, we would 
need to make more iterations to reach the same ~ 1 — 2% accuracy level in the acceleration. The number 
of required iterations, however, can be considerably reduced by using the so-called successive overrelaxation 
(SOR) technique (Hockney & Eastwood 1981; Press et al. 1992). In this technique, the solution in a given 
cell is computed as a weighted average, 

+ (1 - (6) 

where is the solution obtained via the iteration equation (4), (j/^ is the solution from previous iteration 
step, and ui is the overrelaxation parameter. The parameter to can be adjusted to minimize the number of 
iterations required to achieve a certain accuracy level. 

The ultimate goal of any A^-body algorithm is to get an accurate approximation to the pairwise inter¬ 
particle forces. Therefore, we use the force accuracy (see §4) to determine the required number of iterations. 
Of course, there is no point in using more iterations than the number needed to make the iteration error 
smaller than truncation error. The latter can be estimated by making the number of iterations very large, 
so that the iteration error is negligible. We then find the minimum number of iterations for which the force 
accuracy is still at the level of truncation errors. The number of iterations is further minimized by adjusting 
the overrelaxation parameter. We have found empirically that only 10 relaxation iterations are needed if 
uj = 1.25. 


3.5. Particle dynamics 


To integrate the trajectories of the dark matter particles we use the Newtonian equations of motion 
in an expanding cosmological framework (e.g., Peebles 1980). These equations can be expressed in terms 
of comoving coordinates x related to the proper coordinates as r = a(t)x, where a{t) = (1 -I- z)~^ is the 
expansion factor: 


^ = -v. 

dt 


^ 2 > 


(7) 


dx 
dt 

where p is the momentum of a particle and is given by the Poisson equation relating the potential (j) to 
deviations of density from the background: 


^l(j> = ‘iTTGa?{p- p). 
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The above equations are integrated numerically using dimensionless variables 

X = a;ox, t = t/Ho, (j) = (j){xoHof, p = p(xoi?o), P = (9) 

87rG a'^ 

where xq is the length of a zeroth-level mesh cell and Hq is the Hubble constant. We also use the expansion 
factor a instead of the time t, so that equations (7)-(8) can be rewritten as: 
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Here is the present-day (z = 0) contribution of matter to the total density of the universe and is the 
corresponding contribution of the vacuum energy (measured by the cosmological constant). The function / 
is specific to a given cosmological model. The general form of this function, valid for open, flat, and closed 
cosmologies, is (e.g., Carrol et al. 1992): 


/(Hm, a) = 


H (l/o — 1) T — 1) 


We adopt a standard second-order leapfrog integration scheme of advancing particles to the next time 
step. For a step n, corresponding to time step a„ = ai„it + nAa, the momenta and positions of particles are 
updated as follows: 

p„+i = p„_i -/(flM,f^A,a„) V0„ Aa, 

( 12 ) 

v„+i 

x„+i = x„-h/(flM,f^A,a„+i) Aa. 

^ ^ , 1 
n+^ 

Here the indices n, n -I- 1, and n ± ^ refer to quantities evaluated at a„, a„+i, and a„ ± Aa/2 respectively. 
Although multiple time stepping is probably very efhcient in terms of CPU time, in the current version of the 
code we use a constant time step for all particles. We plan to implement individual time steps for different 
levels in the future. Particle coordinates and velocities are updated using their accelerations obtained via 
numerical differentiation of the potential and interpolation to the particle location using the CIC method 
(Hockney & Eastwood 1981). There are, however, some complications because particles can move through 
the level boundaries. The resolution gradients, for example, may induce unwanted force fluctuations and 
anisotropies (Jessop et al. 1994; Anninos, Norman, & Clarke 1994). In addition, momentum conservation, 
achieved by exact cancellation of numerical terms in the CIC method, is no longer guaranteed. This means 
that additional care must be taken to minimize these effects. Usually, this is done by introducing extra buffer 
regions along the mesh interfaces so that force interpolation on the boundaries is avoided. In our code, we 
do not introduce additional buffer cells on the mesh boundaries because the meshes are already expanded 
by smoothing (see §3.2). Therefore, we simply prohibit force interpolation that uses both coarse and fine 
boundary cells, interpolating instead on the coarse level. In this way, particles are driven by the coarse force 
until they move sufficiently far into the finer mesh. The same is true for particles moving from the finer to 
coarser mesh. 


3.6. Memory requirements 

The memory requirements of the code are determined by the number of dark matter particles Np, the 
number of cells in the zeroth-level base grid Nf!, and the number of cells on refinement levels Nf. In the 
current implementation of the code the total number of memory storage elements N used by the code is: 

~ 8Np + 6N° + 15N^. (13) 

The particle information consists of coordinates, momenta, and two pointers used to organize the dou¬ 
bly linked list. The overhead for Nf is determined by the pointers used to support the tree refinement 
hierarchy (see §3.2). It can be significantly reduced (this scheme was implemented in Khokhlov 1997) if 
most of the cell information (namely Level, Parent, Nb, and Pos; see §3.2) is shared between siblings 
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(the eight cells which have the same parent). The cells have individual pointers to their first child and 
to two physical variables (density and potential). In this case, for eight refinement cells we need only 
lLevel+lParent+6Nb+3Pos+8Child +16Var= 35, storage elements (« 4.5 per cell instead of 15). We plan 
to implement these improvements in the future versions of the code. 

j^ART compared to the corresponding number of storage elements in a PM code: 

« 6iVp + iV°; (14) 

The apparent overhead of the ART code compared to the PM code is the price for a fully adaptive and 
flexible mesh structure. It should be noted, however, that to increase resolution by a factor of 2 in a PM 
code the number of cells must be increased by a factor of 8, which severely limits the maximum possible 
dynamic range (~ 1500, with largest currently available computers). In the ART code, the resolution is 
improved by increasing which changes very slowly when resolution is increased. For example, to increase 
resolution in the highest density regions by a factor of 2 in the simulations described in Section 5 (see also 
Table 1), the total number of cells was increased only by ~ 3%. Note also that the dynamic range of ~ 4000 
was achieved with only ~ 5 x 10® cells while a PM code would require ~ 6.4 x 10^® cells to reach the same 
resolution. For comparison, the memory requirements of the publicly available version of TREE code (kindly 
provided by J.Barnes) is « llA^p + ISNceiis, where Nceiis is the number of tree cells. The memory 

requirements of AP®M code (Couchman 1991) are « lOA^p. 


3.7. Timing 

In this section we present timings of the current version of the code and compare the performance 
with other high-resolution A^-body codes. The present version of the code was parallelized to run in shared 
memory mode on the HP-Convex SPP-1200 Exemplar - a multipurpose scalable parallel computer. Cur¬ 
rently, the code is not fully parallelized. The blocks of code that require considerable parallelization efforts, 
namely the density assignment and cell splitting/joining during the mesh modifications, are run serially. The 
parallelization of the most CPU-expensive parts of the code, the Poisson solver and the force interpolation, 
was straightforward. We are now working on the complete parallelization (including distributed memory 
architectures) and optimization of the code and will present details elsewhere. 

In Figure 5a we show the performance of different blocks of the code with respect to the expansion 
parameter in a ACDM simulation (Ha = 1 — Hq = 0.7, h = 0.7, as = 1.0) of an L = 15h~^ Mpc box with 
N = 32® particles and base grid of 64® cells (more details are given in §4.4). The simulation was run on 
8 CPUs of the NCSA SP-1200 Exemplar in shared memory mode. The CPU overhead for running code in 
parallel is about 50% and the same simulation run on an IBM RS/6000 workstation performs ~ 2.5 times 
faster (in terms of CPU but not in terms of wall-clock time!). The overhead is mostly due to the unusually 
large penalty for cache-missing events that results if memory is accessed randomly. In Table 1 we present 
timing for different code blocks for the final time step {z = 0) of two similar ACDM simulations with 64® 
particles (see section 5.2) and with different resolutions. The base grid in both simulations was 128® and 
numbers of maximum allowed refinement levels were 4 and 6 (the number of mesh cells in Table 1 includes 
zeroth-level cells). As before, the simulations were run on 8 CPUs of the SPP-1200 Exemplar. We compare 
the timings from Table 1 with the performance of the AP®M code (Couchman 1991). The final step in a 
two-level AP®M simulation of an open (Hq = 0.5, Ha = 0, h = 0.63, as = 1.2) cosmology (box L = 150/i“® 
Mpc, 128® grid, 128® particles, and smoothing kernel rj = 0.1 cell giving a dynamic range of about 1000) took 
1316 CPU seconds on one processor of IBM SP-2 computer (S.Borgani 1996, private communication). This 
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number is roughly consistent with the timings presented in original paper of Couchman (1991) if we account 
for the difference in Mflops between the machines used. The first simulation in Table 1 is comparable in 
spatial resolution to the above AP^M simulation but we must account for the different number of particles. 
Only density assignment and particle motion directly scale with number of particles. We, therefore, multiply 
the CPU time spent by these routines by 8 which gives a total time for the final step ~ 650 CPU seconds. 
Note, however, that about half of this CPU time is penalty for the cache missings which would be negligible 
for a serial run on the SP-2. The ART code, therefore, is about 3 times faster than AP^M code of comparable 
resolution. Note also that although in the second simulation from Table 1 the resolution was increased by a 
factor of 4, the CPU time did not change significantly because only a relatively small number of additional 
cells was required to resolve the highest density regions. 


3.8. Energy conservation 


In an expanding universe, energy conservation is expressed by the Irvine-Layzer-Dmitriev-Zeldovich 
equation. 


or 


where 


d|„(T + C,l = -T§, 

(15) 

<‘(r + ur„ = - frda. 

(16) 
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The error in energy conservation at a time is then measured by comparing the change in total energy with 
the change in all: 

a{T + U)\7r+Q:lTda 


Error = 


aUQ- 


(18) 


In Figure 5b we show the energy conservation error versus the expansion parameter a for two ACDM 
simulations with 32^ and 64^ particles. In both simulations the time step, Aa, was chosen so that none of 
the particles would move more than a fraction (~ 0.1 — 0.3) of the mesh cell they were residing in over one 
step. We note that energy is conserved at the level of ~ 2% in the 32^ particle simulation and at the level 
of ~ 1% in the 64^ particle simulation. The maximum errors of ~ 5% and ~ 3% for the two simulations 
occurred when the first two refinement levels were opened at a ~ 0.18. This may be a result of the rather 
fast change in resolution in the regions of ongoing nonlinear collapse. 


3.9. Density threshold 

One of the important parameters of the code is the density threshold, which is used to decide whether a 
given patch of the computational volume should be refined or derefined. The usual practice in the adaptive 
refinement algorithms is to consider this threshold a free parameter (see, e.g., Suisalu & Saar 1995). While 
we also take it as a free parameter, we will discuss here some practical limits. First of all, we do not want to 
set up the density threshold too low because it usually leads to shot noise in the refinement procedure and 
can also lead to unwanted two-body effects (we should not force a resolution considerably less than mean 
separation between particles). Our tests (particularly the spherical infall test presented in §4) have shown 
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that two-body effects are negligible and that the refinement/derefinement procedure is stable if we use a 
density threshold corresponding to > 5 — 6 particles in a mesh cell. We choose the minimum value of 5 for 
the simulations described in §5. We can also consider this parameter from a different point of view. The 
threshold can be thought of as a parameter defining the overdensity at which the refinement takes place. In 
the case of simulations presented in §5 the threshold value corresponds to an overdensity of 40 on the first 
refinement level. This overdensity is reasonable for the purpose of resolving a halo - the first refinement 
happens well beyond the halo virial radius (corresponding to the overdensity 200). We have also found that 
results are not particularly sensitive to the threshold in the range of 5 — 10 particles per cell. 


4. Tests of the code 

In this section we present tests of the developed code. In particular, we show results describing the 
accuracy of the force calculation in ART scheme and related issues of resolution. We discuss the results of 
the Zeldovich pancake test and of the spherical infall test. Finally, we compare results for a set of realistic 
cosmological runs obtained with the ART and PM codes. 


4.1. Force accuracy 

It is important to know the shape and accuracy of both short- and long-range forces in order to compare 
the resolution of different codes. Here we present a test showing the accuracy of force calculations in PM 
and ART schemes and compare them to the Plummer softened force often used in both P^M and TREE 
codes. 

We used a 64^ base grid with a massive particle in the center and a second particle placed randomly 
nearby. The refinement meshes were constructed up to the specified level, so that both particles were located 
on this level. The usual potential and force calculations (described above) were then performed to get the 
pairwise force between these two particles which was compared with the “exact” Newton force.[] Figure 6 
shows the results of the force calculation with the FFT method (i.e., ART without refinements) and with the 
relaxation method at different levels of refinement. We plot relative acceleration errors calculated as follows: 

7 -, ^caZc \^theor\ 

Error = -^^-, 

l^theor \ 

where lacajd is the acceleration calculated on the mesh and \sitfieor\ is the theoretical acceleration. The upper 
panel in Figure 6 shows the relative force error versus interparticle separation, given in the units of the base 
grid, for a pure PM calculation and for the second and fourth refinement levels. Note that despite the general 
similarity of the shape of FFT and relaxation forces, the scatter of the latter at small separations is larger 
than that of the formeij^. The scatter at small separations does not, however, mean that we have the same 
errors in particle orbits. We have performed a test placing two particles at the distance of 2 grid cells (where 
the scatter is largest) and giving them velocities in such a way that in case of a zero force error they would 


^We should note that particles in the CIC scheme (see §2) have a cubic rather than a spherical shape so that the exact force 
at small separations is not an inverse square function of distance. 

® However, the magnitude of scatter is typical for PM codes. See, for example, Fig.l in Villumsen (1989) or Fig.2b in Gelb 
(1992) 
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stay on a circular orbit. The deviation of the diameter of the orbit from its true value is a measure of the 
code accuracy. We performed the test for the PM force (two particles on the zeroth level at the distance 
of two base grid cells from each other) and relaxation force (two particles on the second refinement level at 
the distance of 2 second-level cells or 0.5 zeroth-level cells). The resulting particle trajectories are shown in 
Figure 7. After more than two orbital periods the particles stayed on the nearly circular orbit with maximum 
error in diameter of about 8%. Note that we did not observe either significant drift of the orbit center or 
any catastrophic break of the trajectory. We conclude that even for separation equal to the resolution the 
code produces reasonably accurate trajectories of particles. 

The lower panel in Figure 6 shows the relative error for ART force calculated on the fourth level versus 
distance in units of the fourth-level mesh cell, together with the relative error corresponding to the Plummer 
softened force {solid line) with the softening parameter e[<j) oc l/\/r^ -I- e^] equal to the size of the fourth-level 
cell. Comparison shows that while the ART force error fluctuates around zero for distances greater than two 
mesh cells, the Plummer softened force is considerably weaker than 1/r^ law for up to six grid cells. The 
relation between resolution of the ART code (2 mesh cells of the maximum refinement level), Hart, and 
Plummer softening length is thus e « ShART- Gelb (1992) studied the shape of the Plummer softened force 
in his P^M code. The result is consistent with ours (Gelb 1992, Fig. 2.4) - the force starts to fall down at a 
distance about five times larger than the softening length. 


4.2. Zeldovich pancake collapse 


One-dimensional plane wave collapse in an expanding universe is one of the traditional tests of iV-body 
codes (Klypin & Shandarin 1983; Efstathiou et al. 1985). In this test, the analytic solution (Zeldovich 1970) 
is used to check how accurately the code integrates particle trajectories. If we know the initial conditions, 
the solution predicts particle positions for any other moment of time: 


xf = q* -k 



cos (27rk • q), 


here q^ are the initial (unperturbed) positions, k is the wavevector, and A{t) is the amplitude. In an 12 = 1 
universe, A = a{t)/ao, where oq is a scale factor at the crossing time. The corresponding velocities can be 
obtained by differentiating the above formula for positions. 


We used a base grid of 32^ cells with 64^ particles. The particle positions and velocities were initially 
perturbed using the Zeldovich approximation. The particle trajectories were integrated by the ART code 
with three levels of refinement (the density threshold for opening a new level was twenty particles in a cell 
on the base grid and three particles in a cell for any refinement level). In Figure 8 we show one-dimensional 
phase diagrams at the crossing time that compare results of the ART code with results of the PM code 
(grid of 32^ cells). The figure shows that the ART code follows the analytical solution more accurately 
than the PM code. This result is shown quantitatively in Figure 9, where we plot rms deviations of the 
particle positions and velocities from the exact solution as a function of time (Efstathiou et al. 1985): 
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4.3. Spherical infall test 


An analytic solution describing spherical infall of material onto an overdensity in an expanding cos¬ 
mological framework was developed by Fillmore & Goldreich (1984) and Bertschinger (1985). As noted by 
Splinter (1996), the problem possesses a symmetry different from the intrinsic planar symmetry of the mesh 
codes, which makes it a useful and strong test. The analytic solution describes the evolution of a spherical 
uniform overdensity 5p/p = 5i 1 in a, region which has proper radius Ri at some initial time ti in a flat 
Einstein-de Sitter universe. As the density contrast grows, the matter initially inside Ri is increasingly decel¬ 
erated. Eventually it stops expanding and turns around to collapse. If the initial Hubble flow is unperturbed 
(all peculiar velocities are initially equal to zero), the initial turnaround occurs at the time tua'- 


tita = 


At this moment the initial overdensity reaches its maximum radius: 


‘^ita — 


Later on, shells of successively larger and larger radii turn around. At a given time t 3> tua, a shell of a 
radius 

/ t 

starts to collapse (Bertschinger 1985). 

The solution for the density profile of the overdensity is self-similar in terms of the dimensionless variable 
A = r/rta{t) and for A <C 1 can be expressed as (Fillmore & Goldreich 1984; Bertschinger 1985): 


D{X) « 2.79A-®/^ 


while at larger A the main feature of the solution is the presence of sharp caustics. 

We modeled the spherical infall problem described above by distributing 32^ particles uniformly on the 
32^-cell grid (in the centers of the grid cells) and placing additional particles in a sphere of radius 2 grid cell 
lengths in the center of the computational volume. The number of additional particles determines the initial 
overdensity, which we have chosen to be 6i = 0.2. We integrated particle trajectories from Ui = t/ = 0.1 up 
to a = 10.1, taking 10,000 steps to ensure that particles move only a fraction of a mesh cell in a single time 
step on any of the five refinement levels, which is the condition required for the integration to be stable. The 
refinement levels were introduced in the regions where density was equivalent to more than six particles in 
a cell and the number of levels was limited to five, thus making the effective resolution 1024 in the highest 
density regions. In Figure 10 the calculated density profile is compared with the analytic solution (from 
Tables 4 and 5 of Bertschinger 1985). We see a good agreement between the calculated density profile and 
the analytic solution at all radii down to the resolution limit. 


4.4. Realistic cosmological runs: comparison with PM code 

To test the performance of the code in realistic cosmological simulations, we made a set of runs using 
ART and PM codes with the same initial conditions and different spatial resolutions and compared the 
resulting particle distributions. In the first four runs we simulated an L = 20h~^ Mpc box with N = 32^ 
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particles, assuming a flat ACDM cosmological model {Ha = 0.7, h = 0.7, as = 1.0). In the ART runs we 
allowed for two levels of refinement starting from 64^ and 128^ grids - we will call these runs ART 64^ + 2L 
and ART 128^ + 2L. The number of time steps was ~ 1000 in the ART 64^ + 2L and ~ 2000 in the ART 
128^ + 2L run. The refinement levels were introduced wherever the density exceeded a threshold value 
equivalent to more than 5 particles per cell. The PM code was run with 64^-cell (PM 64^) and 256^-cell 
(PM 256^) grids. In Figure 11 we show the projected particle distribution at z = 0 from these four runs, 
with a subsets of particles belonging to one of the halos shown in a smaller window. The global distribution 
of particles and halos is well reproduced by the ART code. Also, halos in the ART 64^ + 2L simulation 
are much more compact than in the PM 64^ simulation. This result is shown quantitatively in Figure 12, 
where we compare density distribution functions (the fraction of the total mass in the regions of a given 
overdensity) in these simulations. The density distributions for all runs were computed after rebinning the 
density field to the 256^ grid. The resolution of a simulation puts limits on the maximum density in the halo 
cores because gravitational collapse virtually stops at scales of ~ 1 grid cell (e.g., Klypin 1996). Therefore, 
the density in the halo cores (the high-density tail of the distribution) is a good indicator of the spatial 
resolution. We note that the density distribution functions for both the ART 64^-|-2L and the PM 256^ runs 
show approximately the same behavior, reaching overdensities of « 2 x 10"^, while the PM 64^ run fails to 
produce halos with overdensities greater than « 5 x 10^. We therefore conclude that the ART code produces 
density fields similar to those of a PM code of comparable resolution. 

The first application of the code was the study of the structure of dark matter halos. Therefore, as a 
final test we compared the halo density profiles in the ART and PM simulations. The size of the simulation 
box, L = 15h~^ Mpc, was chosen to be the same as in the larger simulations described in the next section. 
The rest of the parameters were the same as in the above simulations. We simulated the evolution of the 32^ 
particles using the PM code with 256^-cell mesh and the ART code with a 64^-cell base grid and three levels 
of refinement. As before, we refined regions where the local density exceeded a threshold value of about 
5 particles per cell. A halo-finding algorithm (described in the next section) was applied to the resulting 
particle distribution. In Figure 13 we present the density profiles for six halos of different masses. The mass 
resolution in these simulations (1.2 x IO^^Mq) determined the mass range of halos. The most massive halo 
in Figure 13 consists of about 5000 particles, while the least massive contains only about 100 particles. The 
density profiles of PM and ART halos agree reasonably well down to the resolution limit (~ 60h~^ kpc). 


5. An application: structure of dark matter halos in CDM and ACDM models 

5.1. Motivation 

We used the code to study the structure of dark matter halos in two of the currently popular cosmological 
models: standard cold dark matter (SCDM) and cold dark matter with cosmological constant (ACDM) 
models. The dark matter halos play a crucial role in the formation and dynamics of galaxies and galaxy 
clusters. Therefore, theoretical predictions about the structural and dynamic properties of the halos can be 
compared with observations and used as a powerful test of a given theoretical model. The numerical study of 
the halo structure requires very high spatial dynamic range (at least ~ 10^) because the simulation box has 
to be large enough to account correctly for large perturbation waves and the force resolution has to be high 
enough to make predictions in the observational range (< 5 kpc). The ART code was designed to handle 
such high dynamic ranges. 

The properties of dark matter halos were intensively investigated recently for a variety of cosmological 
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models. Early numerical studies (Frenk et al. 1985, 1988; Quinn, Salmon, & Zurek 1986; Efstathiou et 
al. 1988) indicated that the density profiles of dark matter halos in hierarchical clustering models in a flat, 
n = 1, universe were approximately isothermal [p(r) oc r~^], in agreement with analytic results (Fillmore & 
Goldreich 1984; Bertschinger 1985). The dependence of the halo density profiles on the initial perturbation 
spectrum and on specific parameters of the cosmological model were also studied both analytically (Hoffman 
& Shaham 1985; Hoffman 1988) and numerically (e.g.. Crone, Evrard, & Richstone 1994). These early 
numerical studies, however, lacked the necessary mass and spatial resolution to make reliable predictions on 
the structure of the halo cores. To overcome the resolution limits, substantial efforts were made to simulate 
the formation of halos from isolated density perturbations (e.g., Dubinski & Carlberg 1991; Katz 1991) 
or to resimulate with a higher resolution halos identified in large low-resolution runs (Navarro, Frenk, & 
White 1996a, hereafter NFW; Tormen, Bouchet, & White 1997). These simulations have the advantage of 
simulating halos in a wide mass range with homogeneous spatial and mass resolutions. However, it is hard 
to infer the statistically reliable results from these simulations because only a few halos are simulated. In 
a direct simulation one can get a statistically significant sample of halos suited for more detailed analysis. 
Studies of the structure and dynamics of halos extracted from high-resolution direct simulations were done 
by Warren et al. (1992) and Cole & Lacey (1996). Warren et al. (1992) used a TREE code to simulate the 
evolution of 128^ particles in an H = 1 universe with scale-free initial conditions. The force resolution was 
determined by imposing a Plummer softening of e = 5 kpc. Special emphasis was given to the investigation 
of halo shapes. Similar initial conditions were used in the study by Cole & Lacey (1996), who used a P^M 
code to evolve 128^ particles. The resolution in the latter simulations was L/e = 3840, where L is the size of 
the computational volume and e is the Plummer softening parameter. The results indicated that the density 
profiles of all simulated halos are well fit by the analytical model of NFW. This model has p oc at small 
radii and steepens smoothly to p oc r~^ at a scale radius Ts'- 


plj) oc 


1 
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(19) 


The density profile described by this expression is singular, because the density rises arbitrarily high when r 
0, forming a cusp. The cuspy structure of the central parts of a halo thus represents a generic prediction of 
the model. Although the NFW profile is consistent with current X-ray and gravitational leasing observations 
of galaxy clusters (NFW), the p oc r~^ behavior is in contradiction with observations of dynamics of dwarf 
spiral galaxies that imply flat central density profiles (Flores & Primack 1994; Moore 1994; Burkert 1996). 
These observations can serve as one of the critical tests of any model that includes a dark matter component 
because it is generally believed that the dynamics of the dwarf spiral galaxies is dominated by dark matter 
on scales r > 1 kpc. The fact that the NFW profile holds for a variety of cosmological models (NFW; Cole 
& Lacey 1996) indicates its possible universality for CDM-like models. The goal of the present study was 
to investigate the structure of dark matter halos formed in a ACDM model. This model is currently one 
of the most successful scenarios of structure formation in the Universe. It is, therefore, important to check 
whether the central cusp is present in halos formed in this model. 


5.2. Simulations 

To study the structure of dark matter halos, we simulated the evolution of 64^ particles in standard 
CDM (H = 1, ft, = 0.5, (Ts = 0.63) and ACDM (O = 0.3, Ha = 0.7, ft = 0.7, = 1.0) models. We made three 

runs: one high-resolution (resolution ~ 2h~^ kpc) run for each models, and a lower resolution run (resolution 
~ 8ft“^ kpc) for the ACDM model to study the effects of resolution. In terms of the Plummer softening 



- 18 - 


length (see §4.1), our resolution corresponds to L/e « 12000 for the high resolution runs, and L/e « 3000 for 
the low-resolution run. The simulations were started at z = 30 and the particles trajectories were integrated 
by taking 3872 time steps in the low-resolution run, and 7743 time steps in the high-resolution runs. The size 
of the simulation box, L = 15/i“^ Mpc, determined the mass resolution (particle mass) as 3.55 x 10 ®/i“^Mq 
for SCDM and 1.06 x for ACDM. 


5.3. Halo-finding algorithm 

To identify halos in our simulations, we use an algorithm similar to that described in Klypin, Primack, & 
Holtzman (1996). The algorithm identifies halos as local maxima of mass inside a given radius. The efficiency 
of the algorithm was improved by incorporating the idea of Warren et al. (1992) of finding approximate 
locations of density peaks using particle accelerations. This idea is based on the principle that particles with 
the largest accelerations should reside near the halo centers, which is true for halos with roughly isothermal 
density profiles. In practice, this way of finding density maxima has proved to be quite efficient. The halo 
identification algorithm can be described as follows. 

1. The particles are sorted according to the magnitude of their scalar accelerations. 

2. The particle with the largest acceleration determines the approximate location of the first halo center. 
Particles located inside a sphere of radius Vinu centered at the halo center are assigned to the same halo and 
are excluded from the list of particles used to identify halos. The radius Tina is an adjustable parameter; 
we use a radius approximately twice as large as the force resolution of a simulation. The procedure repeats 
for the particle with the largest acceleration in the list of remaining particles. The peaks are identified until 
there are no particles in the list. 

3. When all the density peaks are identified, we proceed to find more accurate positions of the halo 
centers. This is done iteratively by finding the center of mass of all particles inside Vinit and displacing the 
center of the sphere to the center of mass. The procedure is iterated until convergence. 

4. When the halo centers are found, we increase rinit until the overdensity inside the corresponding 
sphere reaches a certain limit. The limit is based on the top-hat model of gravitational collapse, which 
predicts the typical overdensity for virialized objects ~ 200 in CDM and ~ 334 for our ACDM model (e.g. 
Lahav et al. 1991; Kitayama & Suto 1996). However, we denote halo radius and the mass inside this radius 
defined as M 200 and r 2 oo regardless of the actual value of the limit. Smaller halos located within a radius 
r 2 oo of a bigger halo are deleted from the list. 

As an output we get a list of halo positions, velocities, and parameters (such as r 2 oo and M 2 oo)- 


5.4. Results: Halo Density Profiles 

We applied the halo-finding algorithm described above to identify halos in the simulations at zero 
redshift. Only halos with more than 100 particles within r 2 oo were taken from the full list. We also 
present results for relatively isolated halos, excluding all halos which have close (r < 2 r 2 oo) neighbors of 
mass more than half of the halo mass. The density profiles were constructed by estimating the density in 
concentric spherical shells of logarithmically increasing thickness, with the smallest radius corresponding to 
the maximum resolution. The resulting profiles were fitted with the analytical formula of NFW (eq. [19]), 
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taking the the scale radius as the fit parameter. In Figures 14 and 15 we show density profiles of nine halos 
of different mass identified in the high-resolution CDM and ACDM simulations, along with the analytic fits 
to the halos. The NFW profile appears to be a good approximation for halos of all masses (within the mass 
range of our simulations) in both CDM and ACDM models. 

NFW argued that the concentration parameter, c = r 2 oo/^s (see eq.[19]), of a dark matter halo depends 
on the halo mass. They found that low-mass halos are more centrally concentrated than high-mass ones, 
which possibly reflects a trend in the formation redshifts of halos. Figure 16 shows the concentration c as a 
function of halo mass (M 200 ) for halos identified in our simulations. The solid curve represents a theoretical 
prediction (see NFW for discussion) assuming the definition for the formation time of a halo as the first time 
when half of its final mass M 200 lies in progenitors with individual masses exceeding a fraction / = 0.01 of 
M 200 • This particular value of / seemed to provide the best approximation to the numerical results of NFW 
for the CDM model. The results of both the CDM and the ACDM simulations agree reasonably well with 
this curve. The larger spread of parameter c for low-mass halos arises mainly from statistical noise. The 
lowest mass halos (M 200 ~ 10^^'® M!©) contain a few hundred particles within their r 2 oo and thus have more 
noisy density profiles (typical 2a error in logc ~ 0.2 —0.3) than more massive halos (M 200 > 10^^ M©) which 
have tens of thousands of particles (error in logc ~ 0.05). 


5.5. Effects of resolution 

It is important to model reliably the structure of the very central part (r < 10h~^ kpc) of a halo because 
that is the part for which we can compare model predictions with observational results. Unfortunately, that 
part is also where force resolution may strongly affect the shape of the density profiles. To study possible 
effects of force resolution, we have compared density profiles of the same halos taken from ACDM simulations 
of different resolution described in §5.2. In Figure 17 we compare density profiles of four halos from these 
simulations. As before, the density profile is drawn only to the resolution limit of the simulation. We 
conclude that, up to the resolution limit, the lower resolution density profile follows the higher resolution 
density profile. 


5.6. Conclusions 

We studied the structure of dark matter halos in CDM and ACDM models with a resolution of ~ 2h~^ 
kpc in a box of 15h~^ Mpc. 

1. We found that for r < r 2 oo, the density profiles of all halos in both CDM and ACDM simulations are 
well fitted by the analytical formula (eq. [19]) of NFW. 

2. The mass dependence of the halo concentration parameter c in our simulations is consistent with the 
results of NFW. 

3. The fact that our results for the CDM model agree with the results of NFW serves both as a final 
test of the presented code and as an independent check of their method with results from direct cosmological 
simulations. 
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6. Discussion and conclusions 

We present a new high-resolution 7V-body code that incorporates the idea of an adaptive refinement 
tree (Khokhlov 1997) to build a hierarchy of refinement meshes in regions where higher resolution is desired. 
Unlike other Wbody codes that make use of refinement meshes, our code is able to construct meshes of 
arbitrary shape covering both elongated structures (such as filaments and walls) and roughly spherical dark 
matter halos equally well. The meshes are modified to adjust to the evolving particle distribution instead of 
being rebuilt at every time step. We use a cubic grid as the zeroth level of the mesh hierarchy. The size of 
this grid determines the minimum possible resolution of a simulation (i.e., resolution in regions where there 
are no refinements). The code blocks working on the zeroth-level grid are similar to those of a PM code. 
To solve the Poisson equation on refinement meshes, we have developed a new solver that uses a multilevel 
relaxation method with successive overrelaxation (Hockney & Eastwood 1981; Press et al. 1992). The solver 
is fully parallel and an FFT solver is only twice as fast as our solver for the same number of mesh cells. In 
real simulations with the same resolution, the relaxation solver outperforms the FFT because the resolution 
is achieved with a much smaller number of cells (see §3.7). The tests presented (§4) show that our code 
adequately computes gravitational forces down to scales of ~ 1.5 — 2 mesh cells. The memory overhead in the 
current version of the code is rather large compared to that of other high-resolution codes. The number of 
required mesh cells, however, changes very slowly with increasing resolution. At present, the code is capable 
of handling a dynamic range of ~ 10,000 and higher. In our latest runs, for which we have used modified 
version of the code incorporating multiple time steps, we reached a dynamic range of ~ 24, 000 for a system 
of 128^ particles. Tests of the code performance show that it is about three times faster than an AP^M code 
(and, therefore, TREE code; see Couchman 1991) of comparable resolution. Still, the condition requiring 
that particles move only a certain fraction of a mesh cell at every time step makes the code CPU rather than 
memory limited. 

The present version of the code is by no means optimal and we plan the following improvements. 

1. The memory reqnirement of the code can be significantly rednced if pointers that are used to 
support the tree refinement structure are shared by siblings (descendants of the same parent cell). The 
memory overhead can be reduced even further by incorporating more elaborate data-storage algorithms 
(Khokhlov 1997). The data structures can also be changed to allow for parallelization on distributed memory 
architectures. 

2. The version of the code presented, like any other high-resolution code, is constrained by the condition 
that particles move only a fraction of a mesh cell in a single time step on any of the refinement levels. To 
ensure that this condition is satisfied on the maximum refinement level requires small time steps redundant 
for particles moving on coarser meshes. We are now working on integration scheme with multiple time steps. 

3. We plan to integrate the present A^-body code with a high-resolution Eulerian hydrodynamics code 
(Khokhlov 1997) that works on similar refinement meshes. 

We have used the ART code to study the structure of dark matter halos in two cosmological models - 
standard CDM (U = 1, ft, = 0.5, as = 0.63) and a variant of ACDM (U = 0.3, Oa = 0.7, as = 1.0). We have 
found that halos formed in ACDM model have density profiles similar to halos formed in CDM model. The 
density profiles are well described by the analytical formula (eq.[19]) presented by Navarro et al. (1996a) and 
have cuspy [p(r) oc r~^] structure in the central (r < 10ft“^ kpc) parts of a halo with no indications of a core 
down to the resolution limit of our simulations. The similar model with different parameters {as = 0.7 and 
ft = 0.5), whose are not ideal because of rather small Hubble constant, was independently studied in recent 



- 21 - 


papers by Navarro (1996) and Navarro, Frenk, & White (1996b) with conclusions different from ours. These 
authors conclude that halos formed in ACDM model are less centrally concentrated and are thus more in 
accord with dynamics of dwarf galaxies. Our analysis has shown that the major source of this incosistency 
lies in the definitions of halo radius and concentration parameter c: in the above studies the authors neglect 
the fact that virialization radius in the ACDM model corresponds to overdensity 334 rather than to 200; they 
also normalize their densities to the critical density rather than average density of the universe (as assumed 
in the top hat collapse model). We were able to reproduce their results when we followed their definitions. 
We therefore conclude that halos formed in the ACDM model have structure similar to that of the CDM 
halos and thus cannot explain the dynamics of the central parts of dwarf spiral galaxies inferred from the 
galaxies’ rotation curves. 
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a 



b 


Parent Neighbor 



Child 1 Child 2 


Fig. 1.— Schematic illustration of the tree structure and the pointers used to support it in one dimension, 
(a): Two neighbor cells (“Parent” and “Neighbor”), one of which (“Parent”) is split (it has two children 
marked “Childl” and “Child 2”) and the other of which (“Neighbor”) is a leaf. The arrows denote pointers 
used to support the structure (see text for details). Arrows drawn with dashed lines denote pointers which 
can be shared by all children. Below (b) we show the actual locations of the mesh cells in space. 
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Fig. 2.— Examples of permitted and prohibited neighbor configurations. The prohibited configurations (the 
cells that violate the neighbor rule) are indicated with arrows. Note that a mesh is not allowed to have 
neighbors with level difference greater than 1 but it is allowed to have “corner neighbor” with level difference 

2 . 
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The PS version of this figure can be found at 


http://charon.nmsu.edu/~akravtso/GROUP/group_publications.html 


or at anonymous ftp 


charon.nmsu.edu/pub/kravtsov/PAPERS/ART / 


Fig. 3.— (a) A slice through the refinement structure (base grid is not shown) in one of the ACDM 
simulations with 32^ particles (see §4.4) and (b) the corresponding slice through the particle distribution. 
The area enclosed by the square in (a) is enlarged in Fig.4 
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The PS version of this figure can be found at 


http://charon.nmsu.edu/~akravtso/GROUP/group_publications.html 


or at anonymous ftp 


charon.nmsu.edu/pub/kravtsov/PAPERS/ART / 


Fig. 4.— Region enclosed by the square in Fig. 3a. Note that the mesh generator tends to build almost 
rectangular meshes around dense isolated clumps of particles, while to trace a filament the generator creates 
meshes of arbitrary shape that effectively cover the elongated structures in particle distribution. 
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The PS version of this figure can be found at 


http://charon.nmsu.edu/~akravtso/GROUP/group_publications.html 


or at anonymous ftp 


charon.nmsu.edu/pub/kravtsov/PAPERS/ART / 


Fig. 5.— (a) Timing for ACDM run with 32^ particles on 8 processors of SPP-1200 Exemplar (see §3.8 
for discussion). The total CPU time per step is scaled down by a factor of 2 for convenience, (b) Energy 
conservation error vs. expansion factor a in ACDM runs with 32^ (solid line) and 64^ (dashed line) particles. 
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The PS version of this figure can be found at 


http://charon.nmsu.edu/~akravtso/GROUP/gr oup_publications.html 


or at anonymous ftp 


charon.nmsu.edu/pub/kravtsov/PAPERS/ART / 


Fig. 6.— (a) Pairwise force accuracy of the ART code on the base regular grid (FFT solver) and on the 
second and fourth refinement levels (relaxation solver) vs. interparticle separation, (b) Comparison of the 
force accuracy on the fourth refinement level with the theoretical accuracy of a Plummer softened force, both 
vs. interparticle separation in units of the fourth-level cell length. 



2 



Fig. 7.— The circular motion test. Two particles were placed 2 grid cell lengths apart and were given 
velocities in such a way that in case of a zero force error they would stay on a circular orbit. The dashed 
line shows trajectory (two orbital periods) of two particles moving on the zero PM level (at the distance of 
two base grid cells from each other) and the solid line shows the trajectory of particles moving on the second 
refinement level (at the distance of 2 second-level cells or 0.5 zeroth-level cells). After more than two orbital 
periods, the particles stayed on the nearly circular orbit with maximum error in diameter of about 8%. 
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X X 


Fig. 8.— Plane wave collapse test: phase diagram in Lagrangian coordinates q (a) with 3 refinement levels 
and (b) in the PM simulation with a 32^-cell grid at the crossing time. Solid line, analytic solution; polygons, 
numerical results. The number of polygon vertices is equal to the mesh level plus three, so that triangles 
show particles located on the base grid, squares show particles located on the first refinement level, and so 
on. (c,d) Corresponding phase diagram for physical coordinates x. The Lagrangian coordinates show the 
differences between ART and PM results more clearly, because at the crossing time the v — x phase diagram 
is almost vertical in the central part of the pancake. 
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Fig. 9.— Plane wave collapse test: rms deviations of (a) coordinates and (b) velocities from analytic solution 
vs. the expansion parameter for the PM code (solid line) and for the ART code with three levels of refinement 
(dashed line). Starting from the moment when the first refinement level was introduced (a ~ 0.29), the rms 
deviations of the ART code were lower than those of the PM code 





- 33 - 



X 


Fig. 10.— Spherical infall test: the simulated density profile {filled circles) compared with the analytic 
solution {solid line). The points for the analytic solution are taken from Tables 4 and 5 in Bertschinger 
(1985). The simulated density profile was constructed by estimating the density in concentric spherical shells 
of logarithmically increasing thickness, with the smallest radius corresponding to the maximum resolution. 
The error bars correspond to the Poisson noise. 
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The PS version of this figure can be found at 


http://charon.nmsu.edu/~akravtso/GROUP/group_publications.html 


or at anonymous ftp 


charon.nmsu.edu/pub/kravtsov/PAPERS/ART / 


Fig. 11.— Comparison of projected final distributions of 32^ particles in pure PM runs with (a) 64^-cell and 
(d) 256^-cell grids and in ART runs with two levels of refinement and base grid of (b) 64^ and (c) 128^ cells. 
In the small windows we show subsets of particles belonging to one of the halos (the same halo for all runs). 
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Fig. 12.— Comparison of density distributions (fractions of the total mass in the regions of a given overden¬ 
sity) for PM and ART (two refinement levels) runs. The density distributions for all runs were computed 
after rebinning the density field to the 256^-cell grid. Note that the density distribution functions for both 
ART 64^-|-2L and PM 256^ runs show approximately the same behavior, reaching overdensities of « 2 x lO"*, 
whereas the PM 64^ run fails to produce halos with overdensities greater than « 5 x 10^. 







Fig. 13.— Comparison of density profiles for halos i 
simulations (32^ particles) of comparable resolution 



id in PM {dashed lines) and ART {solid lines) 
^ kpc) with the same initial conditions. 
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Fig. 14.— Solid lines: Density profiles for nine halos of different masses taken from the CDM simulation 
with 64^ particles (resolution ~ 2h~^ kpc). Dotted lines: Best fit by the analytic profile of Navarro et al. 
(1996a). Numbers in each panel indicate the mass of the halo inside a radius corresponding to an overdensity 
of 200. 
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15.— Same as Fig.14, but for ACDM. 
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Fig. 16.— Logarithm of the concentration parameter, c = r 2 oo/^s) vs. logarithm of halo mass M 200 for high- 
resolution (~ 2h~^ kpc) CDM (filled circles) and ACDM simulations [empty circles). The solid curve shows 
the mass-concentration relation predicted from the formation times of halos that best fitted the numerical 
results of Navarro et al. (1996a). 
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Fig. 17.— Effects of resolution on halo density profiles. The profiles of four halos taken from a ACDM run 
with ~ 2h~^ kpc resolution {solid lines) and a run with ~ 7h~^ kpc resolution {dashed lines) are plotted 
down to the resolution limit. 
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Table 1. Code timings for ACDM simulations with 64^ particles on 8 CPUs of the SPP-1200 Exemplar. 


Routine 

Maximum 

Number 

Density 

FFT 

Relaxation 

Particle 

Mesh 

Total 


level 

of mesh cells 

assignment 

solver 

solver 

motion 

modifications 


RUNl CPU (sec) 

4 

4860976 

25.8 

26.2 

54.5 

34.8 

44.0 

185.3 

RUN2 CPU (sec) 

6 

5019136 

30.6 

26.4 

82.7 

40.8 

55.7 

236.2 
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